101 research outputs found

    Using Ginkgo’s memory accessor for improving the accuracy of memory-bound low precision BLAS

    The roofline model not only provides a powerful tool to relate an application's performance with the specific constraints imposed by the target hardware but also offers a graphic representation of the balance between memory access cost and compute throughput. In this work, we present a strategy to break up the tight coupling between the precision format used for arithmetic operations and the storage format employed for memory operations. (At a high level, this idea is equivalent to compressing/decompressing the data in registers before/after invoking store/load memory operations.) In practice, we demonstrate that a "memory accessor" that hides the data compression behind the memory access can virtually push the bandwidth-induced roofline, yielding higher performance for memory-bound applications using high precision arithmetic that can handle the numerical effects associated with lossy compression. We also demonstrate that memory-bound applications operating on low precision data can increase the accuracy by relying on the memory accessor to perform all arithmetic operations in high precision. In particular, we demonstrate that memory-bound BLAS operations (including the sparse matrix-vector product) can be re-engineered with the memory accessor and that the resulting accessor-enabled BLAS routines achieve lower rounding errors while delivering the same performance as the fast low precision BLAS.
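The decoupling of storage format and arithmetic precision described in this abstract can be illustrated with a small sketch. This is not Ginkgo's accessor API: the class `ReducedPrecisionAccessor` and the helper functions are hypothetical names, and plain Python stands in for GPU kernels. The sketch keeps the data in 32-bit storage and compares a dot product whose intermediates are rounded to float32 against one that converts on load and accumulates in double.

```python
import math
import struct
from array import array


def to_f32(x: float) -> float:
    """Round a Python float (IEEE double) to the nearest IEEE single."""
    return struct.unpack("f", struct.pack("f", x))[0]


class ReducedPrecisionAccessor:
    """Hypothetical accessor: keeps float32 in memory, hands out doubles."""

    def __init__(self, values):
        self._storage = array("f", values)  # 4 bytes per element

    def __len__(self):
        return len(self._storage)

    def __getitem__(self, i) -> float:
        return self._storage[i]  # "decompress": float32 -> double on load

    def __setitem__(self, i, value: float) -> None:
        self._storage[i] = value  # "compress": double -> float32 on store


def dot_low_precision(x, y) -> float:
    """Dot product with every intermediate result rounded to float32."""
    acc = 0.0
    for i in range(len(x)):
        acc = to_f32(acc + to_f32(x[i] * y[i]))
    return acc


def dot_with_accessor(x, y) -> float:
    """Same float32 inputs, but all arithmetic stays in double."""
    acc = 0.0
    for i in range(len(x)):
        acc += x[i] * y[i]
    return acc


n = 10_000
x = ReducedPrecisionAccessor([0.1] * n)
y = ReducedPrecisionAccessor([1.0] * n)

# Correctly rounded sum of the stored values, used as the reference.
reference = math.fsum(x[i] * y[i] for i in range(n))
err_low = abs(dot_low_precision(x, y) - reference)
err_accessor = abs(dot_with_accessor(x, y) - reference)
print(f"float32 arithmetic error: {err_low:.3e}")
print(f"double arithmetic error:  {err_accessor:.3e}")
```

Both variants read the same number of bytes from storage, so in a memory-bound setting the more accurate variant would be expected to cost roughly the same, which is the effect the abstract reports for the accessor-enabled BLAS routines.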

    Factorized Solution of Generalized Stable Sylvester Equations Using Many-Core GPU Accelerators


    Sparse matrix‐vector and matrix‐multivector products for the truncated SVD on graphics processors

    Many practical algorithms for numerical rank computations implement an iterative procedure that involves repeated multiplications of a vector, or a collection of vectors, with both a sparse matrix A and its transpose. Unfortunately, the realization of these sparse products on current high performance libraries often delivers much lower arithmetic throughput when the matrix involved in the product is transposed. In this work, we propose a hybrid sparse matrix layout, named CSRC, that combines the flexibility of some well-known sparse formats to offer a number of appealing properties: (1) CSRC can be obtained at low cost from the popular CSR (compressed sparse row) format; (2) CSRC has similar storage requirements as CSR; and especially, (3) the implementation of the sparse product kernels delivers high performance for both the direct product and its transposed variant on modern graphics accelerators thanks to a significant reduction of atomic operations compared to a conventional implementation based on CSR. This solution thus renders considerably higher performance when integrated into an iterative algorithm for the truncated singular value decomposition (SVD), such as the randomized SVD or, as demonstrated in the experimental results, the block Golub–Kahan–Lanczos algorithm.
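To make the motivation concrete, the following sketch shows the two products on a plain CSR matrix. It does not implement the CSRC layout, whose details are not given in this abstract; it only illustrates why the transposed product is the harder case. The function names are illustrative, and sequential Python stands in for a GPU kernel.

```python
def csr_spmv(row_ptr, col_idx, vals, x):
    """y = A x: each row gathers from x and writes exactly one entry of y."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += vals[k] * x[col_idx[k]]
        y[row] = acc
    return y


def csr_spmv_transposed(row_ptr, col_idx, vals, x, n_cols):
    """y = A^T x without forming A^T: each row scatters into y.

    Different rows may update the same y[col]. If rows were processed by
    concurrent threads, these updates would need atomic additions.
    """
    y = [0.0] * n_cols
    for row in range(len(row_ptr) - 1):
        for k in range(row_ptr[row], row_ptr[row + 1]):
            y[col_idx[k]] += vals[k] * x[row]
    return y


# A = [[1, 0, 2],
#      [0, 3, 0],
#      [4, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
vals = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]

y_direct = csr_spmv(row_ptr, col_idx, vals, x)
y_transposed = csr_spmv_transposed(row_ptr, col_idx, vals, x, n_cols=3)
print(y_direct)      # [7.0, 6.0, 19.0]
print(y_transposed)  # [13.0, 6.0, 17.0]
```

In the direct product every output entry has a single writer, while in the transposed product rows 0 and 2 both update entries 0 and 2 of the result. That write conflict is the source of the atomic operations that the abstract says CSRC reduces.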

    Compressed basis GMRES on high-performance graphics processing units

    Krylov methods provide a fast and highly parallel numerical tool for the iterative solution of many large-scale sparse linear systems. To a large extent, the performance of practical realizations of these methods is constrained by the communication bandwidth in current computer architectures, motivating the investigation of sophisticated techniques to avoid, reduce, and/or hide the message-passing costs (in distributed platforms) and the memory accesses (in all architectures). This article leverages Ginkgo's memory accessor in order to integrate a communication-reduction strategy into the (Krylov) GMRES solver that decouples the storage format (i.e., the data representation in memory) of the orthogonal basis from the arithmetic precision that is employed during the operations with that basis. Given that the execution time of the GMRES solver is largely determined by the memory accesses, the cost of the datatype transforms can be mostly hidden, resulting in the acceleration of the iterative step via a decrease in the volume of bits being retrieved from memory. Together with the special properties of the orthonormal basis (whose elements are all bounded by 1), this paves the road toward the aggressive customization of the storage format, which includes some floating-point as well as fixed-point formats with mild impact on the convergence of the iterative process. We develop a high-performance implementation of the "compressed basis GMRES" solver in the Ginkgo sparse linear algebra library using a large set of test problems from the SuiteSparse Matrix Collection. We demonstrate robustness and performance advantages on a modern NVIDIA V100 graphics processing unit (GPU) of up to 50% over the standard GMRES solver that stores all data in IEEE double-precision.
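The remark that all entries of an orthonormal basis are bounded by 1 is what makes fixed-point storage attractive. The sketch below is a minimal illustration of that idea under stated assumptions, not the solver from the article: the 16-bit format, the scale factor `SCALE`, and the helper names are choices made for this example. A unit-norm vector is stored as 16-bit integers and converted back to double before any arithmetic.

```python
import math
from array import array

SCALE = 2**15 - 1  # int16 value that represents 1.0 (assumed format)


def compress_basis_vector(v):
    """Store the entries of a unit-norm vector as 16-bit fixed point."""
    return array("h", (round(val * SCALE) for val in v))


def load_entry(stored, i) -> float:
    """Convert one stored entry back to double before it is used."""
    return stored[i] / SCALE


# A unit-norm vector standing in for one column of the Krylov basis.
raw = [math.sin(0.37 * k) for k in range(1, 65)]
norm = math.sqrt(math.fsum(val * val for val in raw))
v = [val / norm for val in raw]

stored = compress_basis_vector(v)
max_err = max(abs(load_entry(stored, i) - v[i]) for i in range(len(v)))

# A dot product against the basis vector, accumulated in double.
w = [1.0 / (k + 1) for k in range(len(v))]
dot_compressed = math.fsum(load_entry(stored, i) * w[i] for i in range(len(v)))
dot_double = math.fsum(v[i] * w[i] for i in range(len(v)))

print(f"bytes per entry: {stored.itemsize} (vs. 8 for double)")
print(f"max storage error: {max_err:.3e}")
print(f"dot product difference: {abs(dot_compressed - dot_double):.3e}")
```

Because every entry lies in [-1, 1], no exponent bits are needed, and the absolute storage error per entry is at most half of one quantization step, 0.5 / SCALE. Whether such an error is acceptable for convergence is the question the article studies experimentally; this sketch only shows the storage side.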

    Ginkgo: A Modern Linear Operator Algebra Framework for High Performance Computing

    © ACM, YYYY. This is the author's version of the work "Anzt, H., Cojean, T., Flegar, G., Göbel, F., Grützmacher, T., Nayak, P., ... & Quintana-Ortí, E. S. (2022). Ginkgo: A modern linear operator algebra framework for high performance computing. ACM Transactions on Mathematical Software (TOMS), 48(1), 1-33". It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM Transactions on Mathematical Software, Vol. 48, Issue 1 (March 2022), http://doi.acm.org/10.1145/3480935
    In this article, we present GINKGO, a modern C++ math library for scientific high performance computing. While classical linear algebra libraries act on matrix and vector objects, GINKGO's design principle abstracts all functionality as "linear operators," motivating the notation of a "linear operator algebra library." GINKGO's current focus is oriented toward providing sparse linear algebra functionality for high performance graphics processing unit (GPU) architectures, but given the library design, this focus can be easily extended to accommodate other algorithms and hardware architectures. We introduce this sophisticated software architecture that separates core algorithms from architecture-specific backends and provide details on extensibility and sustainability measures. We also demonstrate GINKGO's usability by providing examples on how to use its functionality inside the MFEM and deal.ii finite element ecosystems. Finally, we offer a practical demonstration of GINKGO's high performance on state-of-the-art GPU architectures.
    This work was supported by the "Impuls und Vernetzungsfond of the Helmholtz Association" under grant VH-NG-1241. G. Flegar and E. S. Quintana-Orti were supported by project TIN2017-82972-R of the MINECO and FEDER and the H2020 EU FETHPC Project 732631 "OPRECOMP". This research was also supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. The experiments on the NVIDIA A100 GPU were performed on the HAICORE@KIT partition, funded by the "Impuls und Vernetzungsfond" of the Helmholtz Association. The experiments on the AMD MI100 GPU were performed on Tulip, an early-access platform hosted by HPE.
    Anzt, H.; Cojean, T.; Flegar, G.; Göbel, F.; Grützmacher, T.; Nayak, P.; Ribizel, T.... (2022). Ginkgo: A Modern Linear Operator Algebra Framework for High Performance Computing. ACM Transactions on Mathematical Software. 48(1):1-33. https://doi.org/10.1145/3480935
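The design principle of treating matrices and solvers alike as linear operators can be sketched in a few lines. This is an illustration of the idea only and not Ginkgo's C++ interface: the classes `LinOp`, `DiagonalMatrix`, `DiagonalSolver`, and `Composition` are hypothetical names chosen for this example.

```python
from abc import ABC, abstractmethod


class LinOp(ABC):
    """Hypothetical minimal interface: an operator is applied to a vector."""

    @abstractmethod
    def apply(self, b):
        """Return the result of applying this operator to the vector b."""


class DiagonalMatrix(LinOp):
    """A matrix is an operator: applying it computes D b."""

    def __init__(self, diag):
        self.diag = list(diag)

    def apply(self, b):
        return [d * bi for d, bi in zip(self.diag, b)]


class DiagonalSolver(LinOp):
    """A solver is also an operator: applying it solves D x = b."""

    def __init__(self, matrix: DiagonalMatrix):
        self.matrix = matrix

    def apply(self, b):
        return [bi / d for d, bi in zip(self.matrix.diag, b)]


class Composition(LinOp):
    """Applies its operators right to left, like a matrix product."""

    def __init__(self, *ops):
        self.ops = ops

    def apply(self, b):
        for op in reversed(self.ops):
            b = op.apply(b)
        return b


D = DiagonalMatrix([2.0, 4.0])
solver = DiagonalSolver(D)
round_trip = Composition(solver, D)  # solve after multiply: acts as identity

x = [1.0, 3.0]
print(D.apply(x))           # [2.0, 12.0]
print(round_trip.apply(x))  # [1.0, 3.0]
```

Because the solver exposes the same `apply` interface as the matrix, it can be passed wherever an operator is expected, for instance as a preconditioner inside another solver. That interchangeability is the property the "linear operator algebra" wording in the abstract points to.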

    Parallel computation of 3-D soil-structure interaction in time domain with a coupled FEM/SBFEM approach

    The final publication is available at Springer via http://dx.doi.org/10.1007/s10915-011-9551-x
    This paper introduces a parallel algorithm for the scaled boundary finite element method (SBFEM). The application code is designed to run on clusters of computers, and it enables the analysis of large-scale soil-structure-interaction problems, where an unbounded domain has to fulfill the radiation condition for wave propagation to infinity. The main focus of the paper is on the mathematical description and numerical implementation of the SBFEM. In particular, we describe in detail the algorithm to compute the acceleration unit impulse response matrices used in the SBFEM as well as the solvers for the Riccati and Lyapunov equations. Finally, two test cases validate the new code, illustrating the numerical accuracy of the results and the parallel performance. © Springer Science+Business Media, LLC 2011.
    Jose E. Roman and Enrique S. Quintana-Orti were partially supported by the Spanish Ministerio de Ciencia e Innovacion under grants TIN2009-07519, and TIN2008-06570-C04-01, respectively.
    Schauer, M.; Román Moltó, JE.; Quintana Orti, ES.; Langer, S. (2012). Parallel computation of 3-D soil-structure interaction in time domain with a coupled FEM/SBFEM approach. Journal of Scientific Computing. 52(2):446-467. doi:10.1007/s10915-011-9551-x